Evaluation of re-ranking by prioritizing highly ranked documents in spoken term detection
Abstract
In spoken term detection, detecting out-of-vocabulary (OOV) query terms is important because OOV query terms occur with high probability. This paper proposes a re-ranking method that improves the detection accuracy for OOV query terms after candidate sections have been extracted by a conventional method. The candidate sections are ranked by using dynamic time warping to match the query terms against all available spoken documents. Because highly ranked candidate sections are usually reliable, and because users are assumed to input query terms that are specific to and appear frequently in the target documents, we prioritize candidate sections contained in highly ranked documents by adjusting their matching scores. Experiments were conducted to evaluate the proposed method on the open test collections for the SpokenDoc-2 task of the NTCIR-10 workshop. The results showed that the proposed method improved the mean average precision (MAP) by more than 7.0 points on the two test sets. The proposed method was also applied to the results obtained by other workshop participants, where it improved the MAP by more than 6 points in all cases. These results demonstrate the effectiveness of the proposed method.
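The re-ranking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual score-adjustment formula, so everything specific here is an assumption made for illustration: the `Candidate` type, the `rerank` function, the `top_n` cutoff for what counts as a "highly ranked" document, and the multiplicative factor `alpha` are all hypothetical. The sketch also assumes that the matching score is a DTW distance, where lower is better.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    doc_id: str      # spoken document containing the candidate section
    section_id: str  # identifier of the candidate section
    distance: float  # assumed DTW matching distance; lower is better


def rerank(candidates: List[Candidate], top_n: int = 2,
           alpha: float = 0.8) -> List[Candidate]:
    """Re-rank candidates, favouring sections in highly ranked documents.

    top_n and alpha are hypothetical parameters, not values from the paper.
    """
    # First pass: rank all candidate sections by raw distance (ascending).
    first_pass = sorted(candidates, key=lambda c: c.distance)

    # Documents that contain at least one of the top_n candidates.
    trusted_docs = {c.doc_id for c in first_pass[:top_n]}

    # Second pass: shrink the distance of every candidate in a trusted document.
    adjusted = [
        c._replace(distance=c.distance * alpha) if c.doc_id in trusted_docs else c
        for c in first_pass
    ]
    return sorted(adjusted, key=lambda c: c.distance)


# Toy data (invented for illustration only).
candidates = [
    Candidate("docA", "a1", 0.10),
    Candidate("docB", "b1", 0.30),
    Candidate("docA", "a2", 0.35),
    Candidate("docC", "c1", 0.32),
]
reranked = rerank(candidates, top_n=1, alpha=0.8)
print([c.section_id for c in reranked])  # order: a1, a2, b1, c1
```

In this toy run, section `a2` moves ahead of `b1` and `c1` because it belongs to the same document as the top-ranked section `a1`. A real system would need to tune how strongly document rank influences the adjusted score.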
Related resources
An IWAPU STD System for OOV Query Terms and Spoken Queries
We have been proposing a Spoken Term Detection (STD) method for Out-Of-Vocabulary (OOV) query terms integrating various subword recognition results using monophone, triphone, demiphone, one third phone, and Sub-phonetic segment (SPS) models[1][2]. In this paper, we describe two methods for text OOV query terms and spoken queries. For text OOV query terms, we introduce four unique methods. First...
Efficient Interactive Retrieval of Spo Ranked by Reinforcem
Unlike written documents, spoken documents are difficult to display on the screen; it is also difficult for users to browse these documents during retrieval. It has been proposed recently to use interactive multi-modal dialogues to help the user navigate through a spoken document archive to retrieve the desired documents. This interaction is based on a topic hierarchy constructed by the key ter...
Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection
This study proposes a two-pass spoken term detection (STD) method. The first pass uses phoneme-based dynamic time warping (DTW) for STD, and the second pass recomputes the detection scores produced by the first pass using conditional random field (CRF)-based triphone detectors. In the second pass, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models based on ...
Circular Re-ranking for Visual Search
Conventional approaches to visual search re-ranking empirically take the “classification performance” as the optimization objective, in which each visual document is judged relevant or not, followed by a process of raising the rank of relevant documents. We first show that the classification performance fails to produce a globally optimal ranked list, and then formulate re-ranking as an op...
An STD System for OOV Query Terms Integrating Multiple STD Results of Various Subword units
We have been proposing a Spoken Term Detection (STD) method for Out-Of-Vocabulary (OOV) query terms integrating various subword recognition results using monophone, triphone, demiphone, one third phone, and Sub-phonetic segment (SPS) models. In the proposed method, subword-based ASR (Automatic Speech Recognition) is performed for all spoken documents and subword recognition results are generate...